ci: pilot-migrate clippy job to smithy self-hosted runners by avrabe · Pull Request #201 · pulseengine/spar

avrabe · 2026-05-03T06:01:38Z

Summary

First pilot migration of a CI job from GitHub-hosted to the
pulseengine self-hosted fleet (hetzner-private runner group on
pulseengine-ci-01). Scope deliberately small: just the clippy
job, switched to [self-hosted, linux, x64, rust-cpu]. Other jobs
(fmt, test, proofs) stay on ubuntu-latest.
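
As a rough sketch of the change described above, the `runs-on` switch for the clippy job might look like the fragment below. Only the label list and the use of dtolnay/rust-toolchain come from this PR; the step layout, the action versions, and the clippy flags are assumptions for illustration, not a copy of spar's ci.yml.

```yaml
# Hypothetical excerpt of .github/workflows/ci.yml.
# Only the runs-on values are taken from this PR; everything else is assumed.
jobs:
  clippy:
    # Previously: runs-on: ubuntu-latest
    runs-on: [self-hosted, linux, x64, rust-cpu]
    steps:
      - uses: actions/checkout@v4            # action version is an assumption
      - uses: dtolnay/rust-toolchain@nightly # same toolchain action on hosted and self-hosted
        with:
          components: clippy
      # Flags are assumed; -D warnings turns every clippy warning into a failure.
      - run: cargo clippy --workspace --all-targets -- -D warnings

  fmt:
    runs-on: ubuntu-latest                   # unchanged in the pilot
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@nightly
        with:
          components: rustfmt
      - run: cargo fmt --all -- --check
```

Keeping the diff to a single `runs-on:` line is what makes the rollback a plain revert.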

Rationale

- Spar's recent CI runs show 400-600 min completion times, much of
  which is GitHub-hosted runner queueing on the org-free tier
  (20-concurrent cap).
- Clippy is meaningful compile work (good sccache integration test)
  but bounded: failure doesn't block format checks or tests.
- No sudo, apt, or container needed → no friction with our
  rootless runner setup.
- Spar already pins nightly via dtolnay/rust-toolchain, so the
  toolchain version matches between hosted and self-hosted.

Test plan

- CI run completes: clippy job lands on a rust-cpu runner (1 of 5/6/7) within seconds (no GitHub queue)
- Compile succeeds end-to-end with no clippy warnings
- Other jobs (fmt, test) still run on ubuntu-latest as before
- Second push to this branch should be much faster on clippy thanks to sccache hit

Rollback

Revert this commit. runs-on: flips back to ubuntu-latest and
the next run uses GitHub-hosted compute.

Follow-ups (if green)

- Migrate fmt and test next (separate PRs).
- Add a heavy-quality workflow (mutants-weekly.yml) that targets
  lean-mem runners, separate from gating CI.

Switches just the clippy job from ubuntu-latest to [self-hosted, linux, x64, rust-cpu], one of the three rust-cpu runners on pulseengine-ci-01 (hetzner-private group). Other jobs (fmt, test) stay on ubuntu-latest for now; once we have a few green clippy runs and timing data, the rest can follow.

Why clippy first:
- meaningful compile work (good sccache test)
- bounded scope: failure doesn't block fmt or test
- no sudo, apt, or container needed
- spar already tracks nightly via dtolnay/rust-toolchain, so the toolchain matches between hosted and self-hosted

If this PR's clippy job goes red on the self-hosted runner but passes locally / on hosted, that's a smithy bug, not a code bug.

The previous clippy run on the self-hosted runner failed at highs-sys build because cmake wasn't on the host. smithy main now ships the common Rust build-dep set (cmake, clang, lld, perl, m4, protobuf-compiler, libclang-dev, zlib1g-dev). Pushing an empty commit to re-trigger CI; clippy should now finish on rust-cpu.

Builds on the proven clippy migration (PR description, original commit on this branch). Two separate concerns:

1) ci.yml: broaden the migration

Migrate every gating job that doesn't need infra we don't have on the smithy host. Two stay on ubuntu-latest with explicit comments explaining why; everything else now targets the matching smithy runner class:

  rust-cpu (12G MemoryHigh)   clippy, test, bench-smoke, coverage, proptest, fuzz-smoke, rivet-validate
  lean-mem (24G MemoryHigh)   miri, mutants
  light (4G MemoryHigh)       fmt, audit, deny, supply-chain
  ubuntu-latest (kept)        bazel-test (no Bazel on host), kani (kani-verifier bundles CBMC, ~100 MB install; not worth pre-provisioning until kani sees more use)

The lean-mem class for miri / mutants is deliberate: both are RAM-aggressive (Miri's borrow tracker, mutants' parallel cargo invocations). The 24G MemoryHigh ceiling on smithy lean-mem runners is comfortably above the 12G rust-cpu cap.

2) mutants-weekly.yml: new heavy-quality workflow

Counterpart to the gating `mutants:` job in ci.yml. Different operational pattern (smithy DD-pattern for "heavy quality"):
- schedule: 02:00 UTC every Sunday + workflow_dispatch on demand
- runs-on: lean-mem (24G), timeout-minutes: 720
- concurrency.cancel-in-progress: false (never cancel a quality run)
- workflow_dispatch inputs: `shard` (default 0/8 for sanity, "all" for the full ~hours pass) + `packages` (space-separated -p list)
- results land in GITHUB_STEP_SUMMARY (markdown table of missed/caught/timeout/unviable) plus an uploaded artefact with 90-day retention
- no PR red lights; no auto-Issue filing yet (that's a follow-up once the report shape stabilises)

This is the second-pattern pilot the smithy fleet was sized for: the lean-mem runners have been idle since registration; this puts them on the work they were labelled for.
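
The mutants-weekly.yml settings listed above could be assembled roughly as follows. This is a sketch under stated assumptions, not the file from this branch: the lean-mem label list is inferred by analogy with the rust-cpu one, the scheduled run falling back to "all" is a guess, `packages` is assumed to already contain the `-p` flags, and the step names, action versions, and summary script are illustrative.

```yaml
# Hypothetical sketch of .github/workflows/mutants-weekly.yml.
name: mutants-weekly

on:
  schedule:
    - cron: "0 2 * * 0"          # 02:00 UTC every Sunday
  workflow_dispatch:
    inputs:
      shard:
        description: 'Shard as k/n, or "all" for the full pass'
        default: "0/8"
      packages:
        description: "Space-separated -p list, e.g. -p crate-a -p crate-b"
        default: ""

concurrency:
  group: mutants-weekly
  cancel-in-progress: false      # never cancel a quality run

jobs:
  mutants:
    runs-on: [self-hosted, linux, x64, lean-mem]   # label list is an assumption
    timeout-minutes: 720
    env:
      # Scheduled runs carry no inputs; defaulting to "all" is an assumption.
      SHARD: ${{ inputs.shard || 'all' }}
      PACKAGES: ${{ inputs.packages }}
    steps:
      - uses: actions/checkout@v4
      - name: Run cargo-mutants
        run: |
          args=()
          if [ "$SHARD" != "all" ]; then args+=(--shard "$SHARD"); fi
          # PACKAGES is left unquoted on purpose so "-p a -p b" splits into words.
          # cargo-mutants exits non-zero on missed mutants; keep going so the
          # report steps still run and the workflow raises no red light.
          cargo mutants "${args[@]}" $PACKAGES || true
      - name: Write summary table
        run: |
          {
            echo "| outcome | count |"
            echo "| --- | --- |"
            for f in missed caught timeout unviable; do
              echo "| $f | $(wc -l < "mutants.out/$f.txt") |"
            done
          } >> "$GITHUB_STEP_SUMMARY"
      - uses: actions/upload-artifact@v4
        with:
          name: mutants-out
          path: mutants.out
          retention-days: 90
```

Passing the inputs through `env:` rather than interpolating them into the script keeps dispatch-supplied text out of the shell source.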

GitHub limits workflow_dispatch and schedule triggers to workflows that already exist on the default branch. Adding a path-filtered push trigger lets us exercise the workflow on this PR before merge. The push: block carries a TEMPORARY marker; remove it before merge.
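
A minimal sketch of such a temporary trigger is shown below. The workflow file path in the filter and the cron line are assumptions based on the file name used elsewhere in this PR; the exact wording of the marker comment is illustrative.

```yaml
# Hypothetical "on:" block for mutants-weekly.yml while the PR is open.
on:
  schedule:
    - cron: "0 2 * * 0"
  workflow_dispatch:
  # TEMPORARY: lets the workflow run from this branch; remove before merge.
  push:
    paths:
      - ".github/workflows/mutants-weekly.yml"
```

With the path filter, only pushes that touch the workflow file itself start a run, so ordinary commits on the branch do not trigger the long mutants pass.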

Prior run hit 'Permission denied (os error 13)' on .d files in target/. Direct file-write tests as the runner user succeed; the files are owned correctly with mode 640. Suspect: stale state left by a cancelled run interacting badly with concurrent jobs landing on the same runner via cache restoration. Clearing all runner _work and the shared sccache to bisect: if a clean run also fails, it's not stale state.

Disabled RUSTC_WRAPPER in runner env (smithy commit 65e57a2); runners restarted to pick up the new environment. bpftrace running on host capturing every openat returning EACCES with PID/UID/comm/filename. Pushing this empty commit to fire CI.

codecov · 2026-05-03T08:55:32Z

Codecov Report

✅ All modified and coverable lines are covered by tests.


The action bundles an older cargo-audit that can't parse CVSS 4.0 advisories like RUSTSEC-2026-0037 and exits non-zero on the parse error before evaluating spar's Cargo.lock. cargo-audit is pre-installed on smithy at v0.21.2 (toolchains role) which handles CVSS 4.0 fine. Same effect (audit blocks PRs on advisory hits) without the wrapper.
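
At this commit, calling the preinstalled binary directly might look like the fragment below. The `light` label list and the step layout are assumptions; only the move from the rustsec/audit-check wrapper to a direct `cargo audit` call comes from the commit message.

```yaml
# Hypothetical audit job in ci.yml at this commit.
jobs:
  audit:
    runs-on: [self-hosted, linux, x64, light]   # label list is an assumption
    steps:
      - uses: actions/checkout@v4
      # Uses the cargo-audit already installed on the runner instead of the
      # rustsec/audit-check action. A non-zero exit on advisory hits fails the job.
      - run: cargo audit
```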

Smithy main now ships:
- subuid/subgid for runner1..8 (Cargo Deny rootless container fix)
- CARGO_HOME/bin on the runner env PATH (Rivet validate fix)
- always-on bpftrace EACCES tracing (smithy-trace-eacces.service)

Plus this branch carries:
- cargo audit invoked directly (replaces broken rustsec/audit-check)

All runners restarted with new env. This commit fires fresh CI.

…roken)

Two adjustments after the smithy subuid + PATH fixes landed:

1. cargo-deny: drop EmbarkStudios/cargo-deny-action@v2 (which runs in a rootless container) in favour of direct `cargo deny check`. Smithy has cargo-deny installed (toolchains role v0.16.4). The container action fails on our hardened runner systemd unit: newuidmap is setuid but NoNewPrivileges=true blocks the escalation, so the rootless namespace can't be set up. Going direct sidesteps the entire interaction; we'd otherwise need to weaken the runner hardening for this single workflow.

2. audit: back to ubuntu-latest temporarily. Smithy ships cargo-audit v0.21.2 which still rejects RUSTSEC-2026-0037 ('unsupported CVSS version: 4.0') even though upstream rustsec 0.30+ supports CVSS 4.0. v0.22.1 would fix it but that build trips on our sccache-on-cc setup (aws-lc-sys C compile through sccache fails). Move back once smithy ships an upgraded cargo-audit.

Surfaced when running `cargo deny check` directly with the toolchains-role-installed cargo-deny v0.16.4 on smithy:

  error[deprecated]: this key has been removed, see EmbarkStudios/cargo-deny#611

The yanked + licenses + bans + sources sections still gate normally. Unmaintained-crate detection moved out of the static config in newer cargo-deny; revisit if/when we want to re-enable that signal.

cargo-deny and cargo-audit share the same rustsec advisory parser. Both fail at the same point on RUSTSEC-2026-0037 because the embedded rustsec rejects CVSS 4.0 strings. The audit job (on hosted) still covers vulnerability matching; cargo-deny here keeps gating bans, licenses, and sources, which is what it actually adds beyond audit. Drop the workaround once smithy ships an upgraded rustsec parser (tracked alongside the cargo-audit upgrade).
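
The exact form of the workaround is not shown on this page. One plausible shape, assuming it simply restricts cargo-deny to the checks that do not need the advisory parser, is sketched below; the `light` label list and step layout are likewise assumptions.

```yaml
# Hypothetical deny job; the check list is an assumed form of the workaround.
jobs:
  deny:
    runs-on: [self-hosted, linux, x64, light]   # label list is an assumption
    steps:
      - uses: actions/checkout@v4
      # Naming the checks explicitly skips "advisories", so the embedded
      # rustsec parser never sees the CVSS 4.0 entry.
      - run: cargo deny check bans licenses sources
```

Once the runner ships a parser that accepts CVSS 4.0, the check list can go back to a plain `cargo deny check`.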

avrabe added 6 commits May 3, 2026 07:54

avrabe added 5 commits May 3, 2026 11:32

avrabe enabled auto-merge (squash) May 3, 2026 13:35

avrabe wants to merge 11 commits into main from smithy-clippy-pilot
